Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

نویسندگان

  • Claudio Lucchese
  • Salvatore Orlando
  • Raffaele Perego
چکیده

In this paper we address the problem of mining frequent closed itemsets in a distributed setting. We figure out an environment where a transactional dataset is horizontally partitioned and stored in different sites. We assume that due to the huge size of datasets and privacy concerns dataset partitions cannot be moved to a centralized site where to materialize the whole dataset and perform the mining task. Thus it becomes mandatory to perform separate mining on each site, and then merge the local results do derive a global knowledge. This paper shows how frequent closed itemsets, mined independently in each site, can be merged in order to derive globally frequent closed itemsets. Unfortunately, such merging might produce a superset of all the frequent closed itemsets, while the associated supports could be smaller than the exact ones because some globally frequent closed itemsets might be not locally frequent in some partition. A post-processing phase is thus needed to compute exact global

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Frequent Closed Itemsets from Highly Distributed Repositories

In this paper we address the problem of mining frequent closed itemsets in a highly distributed setting like a Grid. The extraction of frequent (closed) itemsets is an important problem in Data Mining, and is a very expensive phase needed to extract from a transactional database a reduced set of meaningful association rules, typically used for Market Basket Analysis. We figure out an environmen...

متن کامل

DARCI: Distributed Association Rule Mining Utilizing Closed Itemsets

A distributed rule mining algorithm must minimize the communication cost to reduce the communication bandwidth use and to improve the scalability. There are a few distributed rule mining algorithms reported in the literature. In this paper we propose a new distributed association rule mining algorithm, called DARCI, which reduces the communication cost by up to 40% of the best known algorithm. ...

متن کامل

Mining Frequent Closed Itemsets with the Frequent Pattern List

The mining of the complete set of frequent itemsets will lead to a huge number of itemsets. Fortunately, this problem can be reduced to the mining of frequent closed itemsets (FCIs), which results in a much smaller number of itemsets. The approaches to mining frequent closed itemsets can be categorized into two groups: those with candidate generation and those without. In this paper, we propose...

متن کامل

CLOLINK: An Adapted Algorithm for Mining Closed Frequent Itemsets

Mining of the complete set of frequent itemsets will lead to a huge number of itemsets. Fortunately, this problem can be reduced to the mining of closed frequent itemsets, which results in a much smaller number of itemsets. Methods for efficient mining of closed frequent itemsets have been studied extensively by many researchers using various strategies to prove their efficiencies such as Aprio...

متن کامل

CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data

Recently, frequent itemsets mining over data streams attracted much attention. However, mining closed itemsets from data stream has not been well addressed. The main difficulty lies in its high complexity of maintenance aroused by the exact model definition of closed itemsets and the dynamic changing of data streams. In data stream scenario, it is sufficient to mining only approximated frequent...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005